Building a Collocational Semantic Lexicon

Author

  • David Hardcastle
Abstract

Natural Language Generation (NLG) systems require access to collocational information to help determine lexical choices constrained by both syntactic and semantic concerns. Constructing linguistic resources to support these decisions can be time-consuming; if the information is instead extracted automatically, data sparsity limits the variety of the output. This paper reports on a method for extracting collocational data from the British National Corpus and then generalizing it using WordNet to tackle the sparsity problem. The method is evaluated using the lexical choice component of ENIGMA, an NLG system that generates cryptic crossword clues.
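
The general idea of generalizing sparse collocation counts through a lexical hierarchy can be illustrated with a minimal sketch. This is not the paper's implementation: the verb-object pairs and the hypernym table below are hypothetical toy data standing in for the British National Corpus and WordNet, and the function names and the `min_members` threshold are assumptions made for illustration only.

```python
from collections import Counter

# Toy stand-in for verb-object pairs extracted from a corpus (hypothetical data).
PAIRS = [
    ("eat", "apple"), ("eat", "apple"), ("eat", "pear"),
    ("eat", "bread"), ("drive", "car"), ("drive", "lorry"),
]

# Toy stand-in for a WordNet-style hypernym hierarchy (hypothetical data).
HYPERNYM = {
    "apple": "fruit", "pear": "fruit", "plum": "fruit",
    "fruit": "food", "bread": "food",
    "car": "vehicle", "lorry": "vehicle", "van": "vehicle",
}

def ancestors(word):
    """Return the chain of hypernyms above `word`, nearest first."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain

def generalize(pairs, min_members=2):
    """Map each verb to the hypernym classes that are supported by at
    least `min_members` distinct observed objects."""
    members = {}
    for verb, noun in set(pairs):
        for cls in ancestors(noun):
            members.setdefault((verb, cls), set()).add(noun)
    licensed = {}
    for (verb, cls), nouns in members.items():
        if len(nouns) >= min_members:
            licensed.setdefault(verb, set()).add(cls)
    return licensed

def plausible(verb, noun, counts, licensed):
    """Accept a pair if it was observed directly, or if the noun falls
    under a class licensed for the verb."""
    if counts[(verb, noun)] > 0:
        return True
    return any(cls in licensed.get(verb, set()) for cls in ancestors(noun))

counts = Counter(PAIRS)
licensed = generalize(PAIRS)
```

In this sketch, `plausible("eat", "plum", counts, licensed)` accepts a pair that never occurs in the toy data because "plum" shares the class "fruit" with observed objects of "eat", while `plausible("eat", "van", counts, licensed)` is rejected. A real system would need a more careful criterion for how far up the hierarchy to generalize.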

Similar resources

Modeling bilingual word associations as connected monolingual networks

Word associations are a common tool in research on the mental lexicon. Studies report that bilinguals produce different word associations in their non-native language than monolinguals, and propose at least three mechanisms responsible for this difference: bilinguals may rely on their native associations (through translation), on collocational patterns, and on the phonological similarity betwee...

Improving Lexical Databases with Collocational Information: Data from Portuguese

This article focuses on ongoing work done for Portuguese concerning the phenomenon of lexical co-occurrence known as collocation (cf. Cruse, 1986, inter al.). Instances of the syntactic variety formed by noun plus adjective have been especially observed. Collocational instances are not lexical entries, and thus should not be stored in the lexicon as multiword lexical units. Their processing can...

The Generation of Idiomatic and Collocational Expressions

Collocations whose semantic content is not or only partially composed from the semantic content of their parts are often viewed as problematic for generation. In this paper a tactical generator combining FUF as the generation engine and HPSG as the grammar framework is presented. It is shown that the lexicon-driven approach to syntactic and semantic processing is well-suited for the generation...

Co-Occurrence Patterns among Collocations: A Tool for Corpus-Based Lexical Knowledge Acquisition

One of the main problems for applied natural language processing is gaps in the lexicon, including missing words and word senses, and inadequate descriptions of word use in context. Traditional lexicography has similar concerns. The availability of large, on-line text corpora provides a straightforward tool for enlarging the stock of words included in a lexicon. The identification of additional...

Lexical Functions And Machine Translation

This paper discusses the lexicographical concept of lexical functions (Mel'čuk and Zolkovsky, 1984) and their potential exploitation in the development of a machine translation lexicon designed to handle collocations. We show how lexical functions can be thought to reflect cross-linguistic meaning concepts for collocational structures and their translational equivalents, and therefore suggest t...

Publication date: 2007